{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Model summary\n",
    "\n",
    "Get familiar with your model, by examining its structure and properties.\n",
    "\n",
    "Use this notebook to display statics and information about the weights, layers and connectivity of the model.<br>\n",
    "\n",
    "## Table of Contents\n",
    "\n",
    "1. [Choose which model you want to examine](#Choose-which-model-you-want-to-examine)\n",
    "2. [Print a summary of the statistics of the model attributes in tabular format](#Print-a-summary-of-the-statistics-of-the-model-attributes-in-tabular-format)<br>\n",
    "    2.1. [Display some information about the layer types](#Display-some-information-about-the-layer-types)<br>\n",
    "    2.2. [Compare weights footprint to feature-map footprint](#Compare-weights-footprint-to-feature-map-footprint)<br>\n",
    "    2.3. [Compare data footprint to compute (MACs)](#Compare-data-footprint-to-compute-(MACs)\n",
    "3. [Filter L1-norm](#Filter-L1-norm)\n",
    "4. [References](#References)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import matplotlib\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "    \n",
    "from distiller.model_summaries import *\n",
    "from distiller.models import create_model\n",
    "from distiller.apputils import *\n",
    "import torch\n",
    "import torchvision\n",
    "import qgrid\n",
    "\n",
    "# Load some common jupyter code\n",
    "%run distiller_jupyter_helpers.ipynb\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import interactive, interact, Layout\n",
    "\n",
    "# Some models have long node names and require longer lines\n",
    "from IPython.core.display import display, HTML\n",
    "display(HTML(\"<style>.container { width:100% !important; }</style>\"))\n",
    "\n",
    "def pretty_int(i):\n",
    "    return \"{:,}\".format(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Choose which model you want to examine\n",
    "\n",
    "If you are studying the structure of a neural network model, you probably don't need a pruned model, although you can use one.\n",
    "<br>\n",
    "In this example, we look at a pretrained ResNet18 model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = 'imagenet'\n",
    "dummy_input = torch.randn(1, 3, 224, 224)\n",
    "arch = 'resnet18'\n",
    "#arch = 'alexnet'\n",
    "checkpoint_file = None \n",
    "\n",
    "if checkpoint_file is not None:\n",
    "    model = create_model(pretrained=True, dataset=dataset, arch=arch)\n",
    "    load_checkpoint(model, checkpoint_file)\n",
    "else:\n",
    "    model = create_model(pretrained=True, dataset=dataset, arch=arch, parallel=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Print a summary of the statistics of the model attributes in tabular format\n",
    "\n",
    "Distiller generates several different summary reports, which are returned as Pandas dataframes which you can slice, dice and sort using Pandas' rich API.<br>\n",
    "<br>\n",
    "MACs are multiply-accumulate operations: a MAC unit computes the product of two elements and adds the product to an accumulator.  The MACs reported by distiller.model_performance_summary are for direct GEMM (General Matrix-Matrix Multiplication) and convolution.  Different hardware uses specific algorithms at different times.  For example, [Intel's MKL-DNN](https://intel.github.io/mkl-dnn/) uses [Winograd](https://arxiv.org/pdf/1509.09308.pdf) for 3x3 convolutions.  As another example, [convolutions are sometimes computed using GEMM](https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/) for increased utilization of vectorized hardware.<br>\n",
    "\n",
    "\n",
    "<br>\n",
    "In the example below, we display some statistics about the sizes and shapes of the feature-maps and weight tensors, and some other goodies. :-)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "df = distiller.model_performance_summary(model, dummy_input, 1)\n",
    "\n",
    "# You can display summaries using several backends, and each has its advantages and disadvantages, so you will want to use them in different situations.\n",
    "print(\"Weights shapes, sizes and statistics (showing only FC and convolution layers):\")\n",
    "print(\"\\tTotal IFM footprint (elements): \" + \"{:,}\".format(df['IFM volume'].sum()))\n",
    "print(\"\\tTotal OFM footprint (elements): \" + \"{:,}\".format(df['OFM volume'].sum()))\n",
    "print(\"\\tTotal weights footprint (elements): \" + \"{:,}\".format(df['Weights volume'].sum()))\n",
    "    \n",
    "# 1. As a textual table\n",
    "#t = distiller.model_performance_tbl_summary(model, dummy_input, 1)\n",
    "#print(t)\n",
    "\n",
    "# 2. As a plain Pandas dataframe\n",
    "# display(df)\n",
    "\n",
    "# 3. As a QGrid table, which you can sort and filter.\n",
    "\n",
    "qgrid.show_grid(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display some information about the layer types\n",
    "Gleaning model statistics using Pandas dataframes, provides a painless way to query 2nd level details about the model, such as what layer types it uses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "conv7x7 = df[df['Attrs'] == 'k=(7, 7)']\n",
    "conv3x3 = df[df['Attrs'] == 'k=(3, 3)']\n",
    "conv1x1 = df[df['Attrs'] == 'k=(1, 1)']\n",
    "\n",
    "print(\"There are %d Conv(7,7) layers with total MACs = %s\" % (len(conv7x7), pretty_int(conv7x7['MACs'].sum())))\n",
    "print(\"There are %d Conv(3,3) layers with total MACs = %s\" % (len(conv3x3), pretty_int(conv3x3['MACs'].sum())))\n",
    "print(\"There are %d Conv(1,1) layers with total MACs = %s\" % (len(conv1x1), pretty_int(conv1x1['MACs'].sum())))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare weights footprint to feature-map footprint\n",
    "\n",
    "Memory footprint, bandwidth and throughput are different concepts.  Footprint is the size amount of memory required to store a piece of data (e.g. measured as number of bytes).  Bandwidth is the rate at which data can be read or written (stored) from/to memory by different hardware (e.g. measured as bytes/sec).  Throughput is a measure of the data that actually moves (read/stored) in a period of time (bytes/sec).<br>\n",
    "Because the amount of data required for a typical neural-network operation is often larger than the available working memory of the compute hardware (e.g. CPU registers and cache), data often needs to be sliced into tiles (blocks).  The sizes of the tiles, together with the memory access pattern and the compute algorithm, determine the total amount of data that needs to move around (read/stored).  Because of this hardware dependency, we provide below information regarding memory footprint and not throughput."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "fig, ax = plt.subplots(figsize=(15,7.5))\n",
    "fig.suptitle(\"Foortprint Statistics by layer\")\n",
    "ax.set_ylabel(\"Feature Maps\")\n",
    "ax.set_xlabel(\"Layer\")\n",
    "ax2 = ax.twinx()\n",
    "ax2.set_ylabel(\"Weights\")\n",
    "ax.set_xticklabels(df.Name, rotation=90);\n",
    "\n",
    "df[\"FM volume\"] = df[\"OFM volume\"] + df[\"IFM volume\"]\n",
    "df[[\"Name\",\"FM volume\"]].plot(ax=ax, xticks=range(len(df.index)), style=\"b-\", rot=90)\n",
    "df[[\"Name\",\"Weights volume\"]].plot(ax=ax2, style=\"g-\", use_index=True, rot=90);\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare data footprint to compute (MACs)\n",
    "\n",
    "We measure Footprint in number of elements, not bytes.  If, for example, the elements data type is FP32, then the real footprint is 4x the reported footprint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(15,7.5))\n",
    "fig.suptitle(\"Foortprint vs. Compute\")\n",
    "ax.set_ylabel(\"MACs\")\n",
    "ax.set_xlabel(\"Layer\")\n",
    "ax2 = ax.twinx()\n",
    "ax2.set_ylabel(\"Footprint\")\n",
    "\n",
    "df[[\"Name\", \"MACs\"]].plot(ax=ax, kind='bar', rot=90,  xticks=range(len(df.index)), figsize=[15,7.5])\n",
    "ax.set_xticklabels(df.Name, rotation=90);\n",
    "\n",
    "df2 = df[\"Weights volume\"] + df[\"OFM volume\"] + df[\"IFM volume\"]\n",
    "df2.plot(ax=ax2, style=\"g-\", use_index=True, rot=90);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Filter L1-norm\n",
    "\n",
    "Draw the L1 norm of each filter, in a selected weight tensor.<br>\n",
    "When ranking filters by L1-norm (as in [Pruning filters for efficient convnets](#Hao-et-al-2016)), this can provide some insight as to which filters will be removed.<br>\n",
    "Make sure you've loaded a pretrained network, otherwise you will be looking at random data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params_names = conv_param_names(model)\n",
    "\n",
    "def view_weights(pname, sort):\n",
    "    param = model.state_dict()[pname]\n",
    "    view_filters = param.view(param.size(0), -1)\n",
    "    filter_mags = to_np(view_filters.abs().mean(dim=1))\n",
    "    if sort:\n",
    "        filter_mags = np.sort(filter_mags)\n",
    "    plt.figure(figsize=[15,7.5])\n",
    "    plt.plot(range(len(filter_mags)), filter_mags, label=pname, marker=\"o\", markersize=5, markerfacecolor=\"C1\")\n",
    "    plt.xlabel('Filter index (i.e. output feature-map channel)')\n",
    "    plt.ylabel('Fliter L1-norm')\n",
    "\n",
    "sort_choice = widgets.Checkbox(value=True, description='Sort filters')\n",
    "params_dropdown = widgets.Dropdown(description='Weights:', options=params_names, layout=Layout(width='40%'))\n",
    "interact(view_weights, pname=params_dropdown, sort=sort_choice);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "<div id=\"Gray-et-al-2015\"></div> **Andrew Lavin and Scott Gray**. \n",
    "    [*Fast Algorithms for Convolutional Neural Networks*](https://arxiv.org/pdf/1509.09308.pdf),\n",
    "    2015.\n",
    "<div id=\"Hao-et-al-2016\"></div> **Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf**. \n",
    "    [*Pruning filters for efficient convnets*](https://arxiv.org/abs/1608.08710),\n",
    "    2016."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
